Alpha distribution (alpha)#

The Alpha distribution in SciPy (scipy.stats.alpha) is a continuous distribution on \((0,\infty)\) built from a simple transformation of a truncated standard normal.

Despite the generic name, this is not “the alpha parameter” from other contexts (e.g., Dirichlet/Beta). It’s its own distribution with a distinctive \(1/x^2\) tail, which implies that the mean and variance are infinite.


Learning goals#

  • Write down the PDF/CDF and understand where they come from.

  • Build intuition via the truncated-normal generative story.

  • Understand which moments exist (and which diverge).

  • Sample from the distribution with a NumPy-only algorithm.

  • Use scipy.stats.alpha for evaluation and fitting.

import numpy as np

import plotly
import plotly.express as px
import plotly.graph_objects as go
import os
import plotly.io as pio

import scipy
from scipy import optimize
from scipy.stats import alpha as alpha_dist
from scipy.stats import norm

pio.templates.default = "plotly_white"
pio.renderers.default = os.environ.get("PLOTLY_RENDERER", "notebook")

np.set_printoptions(precision=5, suppress=True)
rng = np.random.default_rng(42)

# Record versions for reproducibility (useful when numerical details matter).
VERSIONS = {"numpy": np.__version__, "scipy": scipy.__version__, "plotly": plotly.__version__}

1) Title & Classification#

  • Name: alpha (Alpha distribution; SciPy: scipy.stats.alpha)

  • Type: Continuous

  • Support (standard form): \(x \in (0,\infty)\)

  • Parameter space (standard form): shape \(a > 0\)

  • SciPy location/scale: \(\text{loc} \in \mathbb{R}\), \(\text{scale} > 0\), with \(X = \text{loc} + \text{scale}\,Y\), where \(Y \sim \mathrm{Alpha}(a)\).

Unless stated otherwise, this notebook works with the standard form (loc=0, scale=1).
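
As a quick check of this convention (a minimal sketch, assuming the setup cell above has been run), the loc/scale form satisfies the usual SciPy relation \(f_{\text{loc},\text{scale}}(x) = f\bigl((x-\text{loc})/\text{scale};a\bigr)/\text{scale}\):

# Sketch: SciPy's loc/scale convention versus the standard form.
a, loc, scale = 1.5, 2.0, 3.0
x_check = np.array([2.5, 4.0, 10.0])  # must satisfy (x - loc)/scale > 0

lhs = alpha_dist.pdf(x_check, a, loc=loc, scale=scale)
rhs = alpha_dist.pdf((x_check - loc) / scale, a) / scale
print(np.allclose(lhs, rhs))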

2) Intuition & Motivation#

What it models#

A helpful way to think about the Alpha distribution is through its generative story:

  1. Draw a standard normal random variable \(Z \sim \mathcal{N}(0,1)\).

  2. Condition it to be below a threshold: \(Z \mid (Z \le a)\).

  3. Convert “distance to the threshold” into a positive quantity and take a reciprocal: \(X = \frac{1}{a - Z}\).

If \(Z\) lands very close to \(a\) from below, then \((a-Z)\) is tiny and \(X\) becomes huge. That rare-but-possible event produces a heavy right tail.
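
A quick numerical sketch of this story (assuming the setup cell above has been run; Section 7 turns it into a proper sampler): truncate, transform, and compare a few empirical quantiles with scipy.stats.alpha.

# Generative story: truncate a standard normal at a, then take a reciprocal.
a = 1.0
z = rng.normal(size=200_000)
z = z[z <= a]            # condition on Z <= a
x_gen = 1.0 / (a - z)    # reciprocal of the distance to the threshold

for q in (0.5, 0.9, 0.99):
    print(f"q={q}: empirical={np.quantile(x_gen, q):.3f}, scipy ppf={alpha_dist.ppf(q, a):.3f}")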

Typical real-world use cases#

The Alpha distribution appears in the reliability literature as a model for positive quantities with occasional extreme values (e.g., lifetimes, repair times, stress/strength-like ratios after a transformation). It is most appropriate when:

  • values are strictly positive;

  • the right tail can be very long (outliers are not just noise);

  • you want a model with a clear “threshold proximity” interpretation via the truncated normal story.

Relations to other distributions#

  • Truncated normal: \(Z \mid (Z \le a)\) is the core latent variable.

  • Reciprocal transform: \(X\) is a reciprocal of \((a-Z)\).

  • Pareto-like tail: the Alpha PDF behaves like \(\text{const} \cdot x^{-2}\) for large \(x\), which implies \(\mathbb{P}(X>x) \approx \text{const} \cdot x^{-1}\). This is heavy enough that \(\mathbb{E}[X]\) diverges.

3) Formal Definition#

Let \(\phi\) and \(\Phi\) denote the standard normal PDF and CDF:

\[ \phi(z) = \frac{1}{\sqrt{2\pi}}\exp\left(-\tfrac{1}{2}z^2\right), \qquad \Phi(z) = \int_{-\infty}^z \phi(t)\,dt. \]

PDF#

For \(x>0\) and \(a>0\), the Alpha distribution has PDF

\[ f(x; a) = \frac{1}{x^2\,\Phi(a)}\,\phi\!\left(a - \frac{1}{x}\right). \]

CDF#

For \(x>0\),

\[ F(x; a) = \frac{\Phi\!\left(a - \frac{1}{x}\right)}{\Phi(a)}, \]

and \(F(x;a)=0\) for \(x\le 0\).

Quantile function (PPF)#

Solving \(q = F(x;a)\) for \(x\) gives

\[ F^{-1}(q;a) = \frac{1}{a - \Phi^{-1}\!\bigl(q\,\Phi(a)\bigr)} \qquad (0<q<1). \]

This matches SciPy’s internal implementation of alpha.ppf.

def alpha_pdf(x: np.ndarray, a: float) -> np.ndarray:
    """PDF of the standard Alpha(a) distribution (loc=0, scale=1)."""
    x = np.asarray(x, dtype=float)
    out = np.zeros_like(x, dtype=float)
    mask = x > 0
    xa = x[mask]
    out[mask] = norm.pdf(a - 1.0 / xa) / (xa**2 * norm.cdf(a))
    return out


def alpha_logpdf(x: np.ndarray, a: float) -> np.ndarray:
    """Log-PDF of the standard Alpha(a) distribution (more stable in the tail)."""
    x = np.asarray(x, dtype=float)
    out = np.full_like(x, -np.inf, dtype=float)
    mask = x > 0
    xa = x[mask]
    out[mask] = norm.logpdf(a - 1.0 / xa) - 2.0 * np.log(xa) - np.log(norm.cdf(a))
    return out


def alpha_cdf(x: np.ndarray, a: float) -> np.ndarray:
    """CDF of the standard Alpha(a) distribution."""
    x = np.asarray(x, dtype=float)
    out = np.zeros_like(x, dtype=float)
    mask = x > 0
    xa = x[mask]
    out[mask] = norm.cdf(a - 1.0 / xa) / norm.cdf(a)
    return out


def alpha_ppf(q: np.ndarray, a: float) -> np.ndarray:
    """Quantile function (inverse CDF) of the standard Alpha(a) distribution."""
    q = np.asarray(q, dtype=float)
    return 1.0 / (a - norm.ppf(q * norm.cdf(a)))
# Sanity check: our formulas match SciPy.
a = 1.7
x = np.logspace(-3, 2, 25)

assert np.allclose(alpha_pdf(x, a), alpha_dist.pdf(x, a))
assert np.allclose(alpha_cdf(x, a), alpha_dist.cdf(x, a))
assert np.allclose(alpha_ppf(np.linspace(0.01, 0.99, 9), a), alpha_dist.ppf(np.linspace(0.01, 0.99, 9), a))

4) Moments & Properties#

Tail behavior#

As \(x\to\infty\), we have \(a - 1/x \to a\), so

\[ f(x;a) = \frac{1}{x^2\,\Phi(a)}\,\phi\!\left(a - \frac{1}{x}\right) \sim \frac{\phi(a)}{\Phi(a)}\,\frac{1}{x^2}. \]

Consequently,

\[ \mathbb{P}(X>x) = \int_x^{\infty} f(t;a)\,dt \sim \frac{\phi(a)}{\Phi(a)}\,\frac{1}{x}. \]

This is a power-law tail with exponent 1 for the survival function.
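
A quick numerical check of these asymptotics (a sketch using the helper functions defined above): \(x^2 f(x;a)\) and \(x\,\mathbb{P}(X>x)\) should both approach \(\phi(a)/\Phi(a)\) as \(x\) grows.

# Numerical check: x^2 * f(x;a) -> C and x * P(X > x) -> C, where C = phi(a)/Phi(a).
a = 1.0
C = norm.pdf(a) / norm.cdf(a)

for xv in [1e1, 1e2, 1e3, 1e4]:
    pdf_ratio = (xv**2) * alpha_pdf(np.array([xv]), a)[0] / C
    sf_ratio = xv * (1.0 - alpha_cdf(np.array([xv]), a)[0]) / C
    print(f"x={xv:>8.0f}:  x^2 f(x)/C = {pdf_ratio:.4f},  x P(X>x)/C = {sf_ratio:.4f}")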

Mean / variance / skewness / kurtosis#

Because the tail is so heavy:

  • \(\mathbb{E}[X] = \infty\) (diverges logarithmically)

  • \(\mathrm{Var}(X) = \infty\) (since \(\mathbb{E}[X^2] = \infty\))

  • skewness and kurtosis are undefined (they require finite third/fourth central moments)

More generally, the power-law tail implies (see the numerical check after this list):

  • \(\mathbb{E}[X^p]\) is finite for \(p < 1\)

  • \(\mathbb{E}[X^p]\) diverges for \(p \ge 1\)
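
A numerical sketch of this claim (assuming the setup cell above; scipy.stats.alpha's expect method integrates \(\mathbb{E}[X^p]\) by quadrature):

# Fractional moments E[X^p] via numerical integration; finite only for p < 1.
a = 1.0
for p in [0.25, 0.5, 0.75]:
    m = alpha_dist.expect(lambda x, p=p: x**p, args=(a,))
    print(f"E[X^{p}] ~ {m:.4f}")
# For p >= 1 the integral diverges, so quadrature is not meaningful there
# (and it already converges slowly as p approaches 1).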

MGF / characteristic function#

  • The moment generating function \(M_X(t)=\mathbb{E}[e^{tX}]\) does not exist for any \(t>0\).

  • The Laplace transform \(\mathbb{E}[e^{tX}]\) does exist for \(t<0\) (and can be computed numerically).

  • The characteristic function \(\varphi_X(t)=\mathbb{E}[e^{itX}]\) exists for all real \(t\) because \(|e^{itX}|\le 1\).

A useful integral representation comes from the truncated-normal story. If \(Z\) is a standard normal conditioned on \(Z\le a\) and \(X=1/(a-Z)\), then

\[ \varphi_X(t) = \mathbb{E}[e^{itX}] = \frac{1}{\Phi(a)}\int_{-\infty}^{a} \exp\!\left(\frac{it}{a-z}\right)\,\phi(z)\,dz. \]
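
The oscillatory characteristic-function integral needs care numerically, but the same representation with \(it\) replaced by a negative real \(t\) gives the Laplace transform, which is numerically tame. The sketch below (using scipy.integrate.quad, an extra import beyond the setup cell) checks it against a Monte Carlo estimate.

from scipy import integrate

# Laplace transform E[exp(t X)] for t < 0 via the truncated-normal representation,
# compared with a plain Monte Carlo estimate.
a, t = 1.0, -0.5

val, _ = integrate.quad(lambda z: np.exp(t / (a - z)) * norm.pdf(z), -np.inf, a)
lap_quad = val / norm.cdf(a)

x_mc = alpha_dist.rvs(a, size=200_000, random_state=rng)
lap_mc = np.mean(np.exp(t * x_mc))

print("quadrature :", lap_quad)
print("Monte Carlo:", lap_mc)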

Entropy#

The (differential) entropy is

\[ h(X) = -\int_0^{\infty} f(x;a)\,\log f(x;a)\,dx, \]

which is finite and available via scipy.stats.alpha.entropy.
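
As a cross-check (a sketch using scipy.integrate.quad, an extra import), the entropy integral can be evaluated directly with the stable helpers defined above and compared with scipy.stats.alpha.entropy:

from scipy import integrate

# Differential entropy by direct quadrature of -f(x) log f(x).
a = 1.0

def neg_f_logf(xv):
    arr = np.array([xv], dtype=float)
    return -(alpha_pdf(arr, a)[0] * alpha_logpdf(arr, a)[0])

h_quad, _ = integrate.quad(neg_f_logf, 0, np.inf)
print("entropy via quad :", h_quad)
print("scipy entropy    :", float(alpha_dist.entropy(a)))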

a = 1.0

mean, var, skew, kurt = alpha_dist.stats(a, moments="mvsk")
entropy = alpha_dist.entropy(a)

print("SciPy stats (a=1.0):")
print("  mean   =", mean)
print("  var    =", var)
print("  skew   =", skew)
print("  kurt   =", kurt)
print("  entropy=", entropy)

qs = [0.5, 0.9, 0.99]
print("\nSelected quantiles:")
for q in qs:
    print(f"  q={q:>4}: {alpha_dist.ppf(q, a):.5f}")

# Empirical mean is unstable (finite for finite samples, but does not converge).
print("\nEmpirical summaries (same a, increasing n):")
for n in [200, 2_000, 20_000]:
    x = alpha_dist.rvs(a, size=n, random_state=rng)
    print(
        f"  n={n:>6}: mean={x.mean():.3f}, median={np.median(x):.3f}, 95%={np.quantile(x, 0.95):.3f}, max={x.max():.3f}"
    )
SciPy stats (a=1.0):
  mean   = inf
  var    = inf
  skew   = nan
  kurt   = nan
  entropy= 1.1728113403610725

Selected quantiles:
  q= 0.5: 0.83321
  q= 0.9: 3.30422
  q=0.99: 29.25150

Empirical summaries (same a, increasing n):
  n=   200: mean=1.584, median=0.803, 95%=4.951, max=38.214
  n=  2000: mean=3.025, median=0.824, 95%=6.602, max=497.999
  n= 20000: mean=5.053, median=0.841, 95%=6.204, max=16423.937

5) Parameter Interpretation#

The single shape parameter \(a>0\) plays two linked roles:

  1. It sets the truncation point for the latent normal: \(Z\mid(Z\le a)\).

  2. It appears in the reciprocal transform \(X = 1/(a-Z)\).

Intuitively:

  • Larger \(a\) increases the typical size of \((a-Z)\), so \(X\) tends to be smaller.

  • Smaller \(a\) makes it easier for \(Z\) to land near \(a\) (from below), producing more extreme \(X\) values.

The tail constant is \(\phi(a)/\Phi(a)\); this ratio decreases rapidly as \(a\) grows, so the far tail becomes less prominent for large \(a\) (even though the asymptotic power \(x^{-2}\) remains).
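
A small sketch of this effect (assuming the setup cell above): print the tail constant \(\phi(a)/\Phi(a)\) for the shape values plotted below.

# Tail constant phi(a)/Phi(a): larger a means a smaller constant and a less prominent far tail.
for a in [0.5, 1.0, 2.0, 4.0]:
    print(f"a={a:>3}:  phi(a)/Phi(a) = {norm.pdf(a) / norm.cdf(a):.5f}")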

x = np.logspace(-3, 2, 600)
a_values = [0.5, 1.0, 2.0, 4.0]

fig = go.Figure()
for a in a_values:
    fig.add_trace(go.Scatter(x=x, y=alpha_pdf(x, a), mode="lines", name=f"a={a}"))

fig.update_layout(
    title="Alpha PDF for different a (log x-axis)",
    xaxis_title="x",
    yaxis_title="f(x; a)",
)
fig.update_xaxes(type="log")
fig.show()

# Tail view: survival function on log-log axes (highlights the ~1/x behavior).
x_tail = np.logspace(-1, 3, 600)
fig = go.Figure()
for a in a_values:
    sf = 1.0 - alpha_cdf(x_tail, a)
    fig.add_trace(go.Scatter(x=x_tail, y=sf, mode="lines", name=f"a={a}"))

fig.update_layout(
    title="Alpha survival function on log-log axes",
    xaxis_title="x",
    yaxis_title="P(X > x)",
)
fig.update_xaxes(type="log")
fig.update_yaxes(type="log")
fig.show()
x = np.logspace(-3, 2, 600)
a_values = [0.5, 1.0, 2.0, 4.0]

fig = go.Figure()
for a in a_values:
    fig.add_trace(go.Scatter(x=x, y=alpha_cdf(x, a), mode="lines", name=f"a={a}"))

fig.update_layout(
    title="Alpha CDF for different a (log x-axis)",
    xaxis_title="x",
    yaxis_title="F(x; a)",
)
fig.update_xaxes(type="log")
fig.show()

6) Derivations#

6.1 Deriving the CDF and PDF#

Start with a truncated standard normal:

\[ Z \sim \mathcal{N}(0,1) \ \text{conditioned on}\ \{Z\le a\}. \]

Its density is

\[ f_Z(z) = \frac{\phi(z)}{\Phi(a)}\,\mathbf{1}\{z\le a\}. \]

Define \(X = 1/(a-Z)\). For \(x>0\),

\[\begin{split} \begin{aligned} F(x;a) &= \mathbb{P}\left(\frac{1}{a-Z} \le x\ \middle|\ Z\le a\right) \\ &= \mathbb{P}\left(a-Z \ge \frac{1}{x}\ \middle|\ Z\le a\right) \\ &= \mathbb{P}\left(Z \le a - \frac{1}{x}\ \middle|\ Z\le a\right) \\ &= \frac{\Phi\!\left(a - \frac{1}{x}\right)}{\Phi(a)}. \end{aligned} \end{split}\]

Differentiating with respect to \(x\) gives

\[ f(x;a) = \frac{1}{\Phi(a)}\,\phi\!\left(a - \frac{1}{x}\right)\,\frac{1}{x^2}. \]
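
A quick numerical sanity check of this differentiation (a sketch using the helpers defined in Section 3 and a central finite difference):

# The closed-form PDF should match a central finite difference of the CDF.
a = 1.7
x_check = np.array([0.2, 0.8, 2.0, 10.0])
h = 1e-6

fd = (alpha_cdf(x_check + h, a) - alpha_cdf(x_check - h, a)) / (2 * h)
print("finite difference:", fd)
print("closed-form PDF  :", alpha_pdf(x_check, a))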

6.2 Why the mean and variance diverge#

Use the tail asymptotic \(f(x;a) \sim C/x^2\) with \(C=\phi(a)/\Phi(a)\).

  • For the mean: \(\mathbb{E}[X] = \int_0^{\infty} x\,f(x;a)\,dx\) has integrand \(\sim C/x\) for large \(x\), and \(\int^{\infty} x^{-1}\,dx\) diverges (logarithmically).

  • For the second moment: \(\mathbb{E}[X^2] = \int_0^{\infty} x^2\,f(x;a)\,dx\) has integrand \(\sim C\) for large \(x\), and \(\int^{\infty} 1\,dx\) diverges.

So \(\mathbb{E}[X]\) and \(\mathrm{Var}(X)\) are infinite.
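
To see the logarithmic divergence numerically (a sketch using scipy.integrate.quad and the alpha_pdf helper): the truncated mean \(\int_0^M x\,f(x;a)\,dx\) keeps growing, with each extra decade of \(M\) adding roughly \(C\ln 10\).

from scipy import integrate

# Truncated mean grows without bound, roughly C*ln(10) per decade of the cutoff M.
a = 1.0
C = norm.pdf(a) / norm.cdf(a)

prev = None
for M in [1e1, 1e2, 1e3, 1e4, 1e5]:
    val, _ = integrate.quad(lambda x: x * alpha_pdf(np.array([x]), a)[0], 0, M, limit=200)
    line = f"M=1e{int(np.log10(M))}: truncated mean = {val:7.4f}"
    if prev is not None:
        line += f"   (increase = {val - prev:.4f}, C*ln(10) = {C * np.log(10):.4f})"
    print(line)
    prev = val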

6.3 Likelihood (i.i.d. sample)#

Given data \(x_1,\dots,x_n\) with \(x_i>0\), the likelihood for \(a\) (standard form) is

\[ L(a; x_{1:n}) = \prod_{i=1}^n \frac{1}{x_i^2\,\Phi(a)}\,\phi\!\left(a - \frac{1}{x_i}\right). \]

The log-likelihood is

\[ \ell(a) = -2\sum_{i=1}^n \log x_i\; -\; n\log\Phi(a)\; +\; \sum_{i=1}^n \log\phi\!\left(a - \frac{1}{x_i}\right). \]

Differentiating (using \(\tfrac{d}{da}\log\Phi(a)=\phi(a)/\Phi(a)\) and \(\tfrac{d}{du}\log\phi(u)= -u\) with \(u = a - 1/x_i\)) gives the score:

\[ \ell'(a) = -n\,\frac{\phi(a)}{\Phi(a)}\; -\; \sum_{i=1}^n\left(a - \frac{1}{x_i}\right). \]

Setting \(\ell'(a)=0\) yields a nonlinear equation in \(a\) (because of \(\phi(a)/\Phi(a)\)), so maximum likelihood estimation typically uses numerical methods.

def alpha_loglik(a: float, x: np.ndarray) -> float:
    x = np.asarray(x, dtype=float)
    if a <= 0 or np.any(x <= 0):
        return -np.inf
    # Uses SciPy's stable norm.logpdf implementation.
    return float(np.sum(norm.logpdf(a - 1.0 / x) - 2.0 * np.log(x)) - x.size * np.log(norm.cdf(a)))


def alpha_score(a: float, x: np.ndarray) -> float:
    x = np.asarray(x, dtype=float)
    if a <= 0 or np.any(x <= 0):
        return np.nan
    return float(-x.size * (norm.pdf(a) / norm.cdf(a)) - np.sum(a - 1.0 / x))


# MLE demo in the standard form (loc=0, scale=1) via a grid.
a_true = 1.8
x = alpha_dist.rvs(a_true, size=3_000, random_state=rng)

a_grid = np.linspace(0.05, 6.0, 500)
ll = np.array([alpha_loglik(a, x) for a in a_grid])
a_hat_grid = a_grid[np.argmax(ll)]

fig = go.Figure(go.Scatter(x=a_grid, y=ll - ll.max(), mode="lines"))
fig.add_vline(x=a_true, line_dash="dash", line_color="green", annotation_text="true a")
fig.add_vline(x=a_hat_grid, line_dash="dash", line_color="red", annotation_text="grid MLE")
fig.update_layout(
    title="Log-likelihood (centered) for a (standard form)",
    xaxis_title="a",
    yaxis_title="log L(a) - max_a log L(a)",
)
fig.show()

# Compare to SciPy's fit when loc/scale are fixed.
a_hat_scipy, loc_hat, scale_hat = alpha_dist.fit(x, floc=0, fscale=1)
print("True a     =", a_true)
print("Grid MLE   =", a_hat_grid)
print("SciPy fit  =", a_hat_scipy)
print("(loc,scale)=", (loc_hat, scale_hat))
True a     = 1.8
Grid MLE   = 1.8385771543086173
SciPy fit  = 1.836523437500002
(loc,scale)= (0, 1)
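
The score equation from Section 6.3 can also be solved directly with a bracketing root finder (a sketch using scipy.optimize.brentq, available via the setup cell's optimize import, on the same sample x and the alpha_score helper defined above):

# Solve the score equation l'(a) = 0 numerically; it should agree with the grid/SciPy MLEs.
# Assumes the score changes sign on [0.05, 6.0], which it does for this sample.
a_hat_root = optimize.brentq(alpha_score, 0.05, 6.0, args=(x,))
print("Root-finding MLE =", a_hat_root)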

7) Sampling & Simulation#

NumPy-only algorithm#

Use the truncated-normal representation:

  1. Sample \(Z \sim \mathcal{N}(0,1)\) until \(Z\le a\) (rejection sampling for a one-sided truncation).

  2. Transform \(X = 1/(a-Z)\).

Because \(a>0\), the acceptance probability is \(\mathbb{P}(Z\le a)=\Phi(a) > 1/2\), so each proposed normal draw is accepted more than half the time and the rejection sampler is efficient.

We implement it below using only NumPy.

def alpha_rvs_numpy(a: float, size=1, *, rng: np.random.Generator | None = None) -> np.ndarray:
    """Draw samples from Alpha(a) using only NumPy.

    Algorithm:
      - sample Z ~ N(0,1) until Z <= a (one-sided truncation)
      - return X = 1/(a - Z)
    """
    if a <= 0:
        raise ValueError("a must be > 0")
    rng = np.random.default_rng() if rng is None else rng

    size_tuple = (size,) if np.isscalar(size) else tuple(size)
    n = int(np.prod(size_tuple))
    out = np.empty(n, dtype=float)

    filled = 0
    while filled < n:
        # Oversample to reduce loop overhead.
        m = max(256, 2 * (n - filled))
        z = rng.normal(size=m)
        z = z[z <= a]
        if z.size == 0:
            continue
        take = min(z.size, n - filled)
        out[filled : filled + take] = 1.0 / (a - z[:take])
        filled += take

    return out.reshape(size_tuple)

8) Visualization#

We’ll compare:

  • the theoretical PDF and CDF

  • Monte Carlo samples (NumPy-only sampler)

  • SciPy’s implementation

a = 1.0
n = 50_000

x_np = alpha_rvs_numpy(a, size=n, rng=rng)
x_sp = alpha_dist.rvs(a, size=n, random_state=rng)

# Histogram vs PDF
x_grid = np.logspace(-3, 2, 500)
pdf_grid = alpha_pdf(x_grid, a)

fig = px.histogram(
    x=x_np,
    nbins=120,
    histnorm="probability density",
    title="Monte Carlo histogram (NumPy-only) vs theoretical PDF",
    labels={"x": "x"},
)
fig.add_trace(go.Scatter(x=x_grid, y=pdf_grid, mode="lines", name="theoretical PDF"))
fig.update_xaxes(type="log")
fig.show()

# Empirical CDF vs theoretical CDF
x_sorted = np.sort(x_np)
ecdf = np.arange(1, n + 1) / n
cdf_grid = alpha_cdf(x_grid, a)

fig = go.Figure()
fig.add_trace(go.Scatter(x=x_sorted, y=ecdf, mode="lines", name="empirical CDF (NumPy-only)"))
fig.add_trace(go.Scatter(x=x_grid, y=cdf_grid, mode="lines", name="theoretical CDF"))
fig.update_layout(title="CDF: empirical vs theoretical", xaxis_title="x", yaxis_title="F(x)")
fig.update_xaxes(type="log")
fig.show()

# Quick check that NumPy-only samples and SciPy samples look similar (KS statistic).
from scipy.stats import ks_2samp

ks = ks_2samp(x_np, x_sp)
print("KS two-sample test (NumPy vs SciPy samples):")
print(ks)
KS two-sample test (NumPy vs SciPy samples):
KstestResult(statistic=0.005519999999999969, pvalue=0.42984315222942004, statistic_location=1.0048665241757195, statistic_sign=1)

9) SciPy Integration#

scipy.stats.alpha provides the usual distribution API:

  • alpha.pdf(x, a, loc=0, scale=1)

  • alpha.cdf(x, a, loc=0, scale=1)

  • alpha.rvs(a, loc=0, scale=1, size=..., random_state=...)

  • alpha.fit(data, ...) (MLE)

A common pattern is to freeze the distribution: rv = alpha(a, loc=..., scale=...), then call rv.pdf, rv.cdf, rv.rvs, etc.

a = 2.0
rv = alpha_dist(a)  # frozen, standard form

x = np.array([0.1, 0.5, 1.0, 5.0])
print("pdf:", rv.pdf(x))
print("cdf:", rv.cdf(x))

samples = rv.rvs(size=5, random_state=rng)
print("rvs:", samples)

# Fitting (standard form): fix loc=0, scale=1 and estimate only a.
a_true = 1.5
data = alpha_dist.rvs(a_true, size=5_000, random_state=rng)
a_hat, loc_hat, scale_hat = alpha_dist.fit(data, floc=0, fscale=1)
print("\nFit (fixed loc/scale):")
print("  true a:", a_true)
print("  est  a:", a_hat)
print("  (loc, scale):", (loc_hat, scale_hat))
pdf: [0.      1.63292 0.2476  0.00323]
cdf: [0.      0.51164 0.86093 0.98651]
rvs: [0.60403 0.28935 0.38106 0.65239 2.43316]

Fit (fixed loc/scale):
  true a: 1.5
  est  a: 1.5148437500000016
  (loc, scale): (0, 1)

10) Statistical Use Cases#

Hypothesis testing (goodness-of-fit)#

If you have a specified parameter \(a\) (not estimated from the same sample), you can test whether data plausibly comes from an Alpha distribution using a goodness-of-fit test such as Kolmogorov–Smirnov (KS).

Caveat: if you estimate \(a\) from the data and then run KS on the same data, the usual p-values are no longer exact (you need a corrected procedure or a parametric bootstrap; see the sketch after the KS example below).

Bayesian modeling#

Because the likelihood \(p(x\mid a)\) is available in closed form, you can put a prior on \(a>0\) (e.g., Gamma) and perform Bayesian inference with generic tools:

\[ p(a\mid x_{1:n}) \propto p(x_{1:n}\mid a)\,p(a). \]

Below we show a simple grid-based posterior computation.

Generative modeling#

The Alpha distribution can be used as a heavy-tailed positive noise model or as a component in a mixture model when you want strictly-positive data with occasional extremes.

Because the mean is infinite, summary statistics and loss functions that rely on the mean (e.g., squared error to the mean) can behave poorly; robust summaries (median/quantiles) are often more appropriate.

# Hypothesis testing example: KS test when a is known.
from scipy.stats import kstest

a = 1.2
x = alpha_dist.rvs(a, size=2_000, random_state=rng)

D, p_value = kstest(x, alpha_dist(a).cdf)
print("KS test against Alpha(a=1.2):")
print("  D      =", D)
print("  p-value=", p_value)
KS test against Alpha(a=1.2):
  D      = 0.018166844743858906
  p-value= 0.5181148318909714
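
When \(a\) is estimated from the same data, the caveat above applies. One standard remedy is a parametric bootstrap of the KS statistic; a minimal sketch (assuming the setup cell and the imports above):

# Parametric bootstrap for the KS statistic when a is estimated from the data.
x_obs = alpha_dist.rvs(1.2, size=500, random_state=rng)
a_hat, _, _ = alpha_dist.fit(x_obs, floc=0, fscale=1)
D_obs, _ = kstest(x_obs, alpha_dist(a_hat).cdf)

B = 200  # modest for speed; use more in practice
D_boot = np.empty(B)
for b in range(B):
    x_b = alpha_dist.rvs(a_hat, size=x_obs.size, random_state=rng)
    a_b, _, _ = alpha_dist.fit(x_b, floc=0, fscale=1)
    D_boot[b], _ = kstest(x_b, alpha_dist(a_b).cdf)

p_boot = np.mean(D_boot >= D_obs)
print("Observed KS statistic:", D_obs)
print("Bootstrap p-value    :", p_boot)
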
# Bayesian modeling example: grid posterior for a with a Gamma prior.

from scipy.stats import gamma as gamma_dist

rng_local = np.random.default_rng(123)
a_true = 1.8
x = alpha_dist.rvs(a_true, size=800, random_state=rng_local)

# Prior: a ~ Gamma(k, theta) with support (0, inf)
k, theta = 2.0, 1.0

a_grid = np.linspace(0.05, 6.0, 800)
log_prior = gamma_dist(a=k, scale=theta).logpdf(a_grid)
log_like = np.array([alpha_loglik(a, x) for a in a_grid])

log_post_unnorm = log_like + log_prior
log_post = log_post_unnorm - np.max(log_post_unnorm)
post_unnorm = np.exp(log_post)
post = post_unnorm / np.trapz(post_unnorm, a_grid)

a_map = a_grid[np.argmax(post)]

fig = go.Figure(go.Scatter(x=a_grid, y=post, mode="lines"))
fig.add_vline(x=a_true, line_dash="dash", line_color="green", annotation_text="true a")
fig.add_vline(x=a_map, line_dash="dash", line_color="red", annotation_text="MAP")
fig.update_layout(
    title="Posterior over a (Gamma prior + Alpha likelihood)",
    xaxis_title="a",
    yaxis_title="posterior density",
)
fig.show()

print("True a =", a_true)
print("MAP    =", a_map)
True a = 1.8
MAP    = 1.7925531914893618
# Generative modeling example: heavy-tailed positive "durations".
# In practice, prefer robust summaries (quantiles) over the mean.

a = 1.0
durations = alpha_rvs_numpy(a, size=80_000, rng=rng).ravel()

print("Summaries (a=1.0):")
print("  median =", float(np.median(durations)))
print("  mean   =", float(durations.mean()))
print("  99%    =", float(np.quantile(durations, 0.99)))
print("  max    =", float(durations.max()))

# Empirical CCDF on log-log axes; the far tail is close to ~const/x.
x_sorted = np.sort(durations)
n = x_sorted.size
ccdf = 1.0 - np.arange(1, n + 1) / n

x0 = float(np.quantile(x_sorted, 0.9))
mask = x_sorted >= x0
x_tail = x_sorted[mask]
ccdf_tail = ccdf[mask]

# Reference ~c/x line anchored at x0 using the empirical CCDF.
c0 = float(ccdf_tail[0] * x_tail[0])
ref = c0 / x_tail

fig = go.Figure()
fig.add_trace(go.Scatter(x=x_tail, y=ccdf_tail, mode="lines", name="empirical CCDF (tail)"))
fig.add_trace(
    go.Scatter(
        x=x_tail,
        y=ref,
        mode="lines",
        name="~c/x reference",
        line=dict(dash="dash"),
    )
)
fig.update_layout(
    title="Generative example: empirical tail on log-log axes",
    xaxis_title="x",
    yaxis_title="P(X > x)",
)
fig.update_xaxes(type="log")
fig.update_yaxes(type="log")
fig.show()
Summaries (a=1.0):
  median = 0.8381684906603133
  mean   = 4.219827793554541
  99%    = 30.191825495643975
  max    = 34695.03256416532

11) Pitfalls#

  • Invalid parameters: \(a\le 0\) is not allowed; the support is \(x>0\) in the standard form.

  • Infinite mean/variance: sample means can look “reasonable” for small \(n\) but are not stable estimators; prefer medians/quantiles.

  • Numerical issues in the tail (see the sketch at the end of this section):

    • use logpdf when multiplying many densities or when \(x\) is extreme;

    • ppf(q) for \(q\) extremely close to 1 can overflow because the true quantile is enormous.

  • Fitting sensitivity: because extreme values occur, MLE can be sensitive to outliers and optimizer settings; consider robust diagnostics (QQ plots, tail checks) and bootstrap uncertainty.
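
Two of the numerical pitfalls are easy to demonstrate (a small sketch using the helpers and imports above):

# Deep in the left tail the pdf underflows to 0 while the logpdf stays finite;
# quantiles explode as q approaches 1.
a = 1.0
x_small = np.array([1e-3])   # a - 1/x is a huge negative number here

print("pdf   :", alpha_pdf(x_small, a)[0])      # underflows to 0.0
print("logpdf:", alpha_logpdf(x_small, a)[0])   # very negative, but finite

for q in [0.999, 1 - 1e-9, 1 - 1e-15]:
    print(f"ppf({q}) = {alpha_dist.ppf(q, a):.6g}")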

12) Summary#

  • The Alpha distribution (scipy.stats.alpha) is a continuous distribution on \((0,\infty)\) with shape parameter \(a>0\).

  • It can be generated by sampling a truncated normal \(Z\mid(Z\le a)\) and transforming via \(X=1/(a-Z)\).

  • Its right tail behaves like \(\text{const} \cdot x^{-2}\), implying \(\mathbb{E}[X]=\infty\) and \(\mathrm{Var}(X)=\infty\).

  • PDF/CDF/PPF have clean expressions in terms of the standard normal \(\phi\) and \(\Phi\).

  • Sampling is straightforward with a NumPy-only rejection sampler for the one-sided truncated normal.

References#

  • Johnson, Kotz, and Balakrishnan. Continuous Univariate Distributions, Volume 1 (2nd ed.), Wiley, 1994.

  • Salvia, A. A. “Reliability applications of the Alpha Distribution.” IEEE Transactions on Reliability, 1985.

  • SciPy documentation: scipy.stats.alpha.